{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dh28jVtATQbX"
   },
   "source": [
    "# Reducing Hallucinations from AI Agents using Long-Term Memory\n",
    "### Introduction to Critique-Based Contexting with OpenAI, LangChain, and LanceDB\n",
    "AI agents can help simplify and automate tedious workflows. By going through this notebook, we'll introduce how you can reduce hallucinations from AI agents by using critique-based contexting with a fitness trainer agent example."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "qsnz2_w_TQbb"
   },
   "source": [
    "### Installing dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "40qw2MppTQbb"
   },
   "outputs": [],
   "source": [
    "!pip install openai==0.28 langchain==0.0.354 google-search-results lancedb python-dotenv -q"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "55YGOMBaTQbc"
   },
   "source": [
    "### Importing libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "0Hle9B9nTQbd"
   },
   "outputs": [],
   "source": [
    "from langchain.agents import load_tools\n",
    "from langchain.agents import initialize_agent\n",
    "from langchain.agents import AgentType\n",
    "from langchain.chat_models import ChatOpenAI"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "74QsuNO4TQbd"
   },
   "source": [
    "Now let's import our environment variables via `load_dotenv()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "IA4GyAbRTQbd",
    "outputId": "88591262-00c9-4fd9-afa2-f2629332a0c3"
   },
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "metadata": {},
     "execution_count": 4
    }
   ],
   "source": [
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BJ4iBqZ9TQbe"
   },
   "source": [
    "We now specify and connect to the path `data/agent-lancedb` to store our vector database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "id": "zwzrQpF-TQbe"
   },
   "outputs": [],
   "source": [
    "import lancedb\n",
    "\n",
    "db = lancedb.connect(\"data/agent-lancedb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "40GelwHLTQbf"
   },
   "source": [
    "To create embeddings out of the text, we'll call the OpenAI embeddings API (ada2 text embeddings model) to get embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "id": "jXfgGYh-TQbf"
   },
   "outputs": [],
   "source": [
    "import openai\n",
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-proj-....\"  # PASTE YOUR KEY HERE\n",
    "\n",
    "\n",
    "def embed_func(c):\n",
    "    rs = openai.Embedding.create(input=c, engine=\"text-embedding-ada-002\")\n",
    "    return [record[\"embedding\"] for record in rs[\"data\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "R5XypTB6TQbf"
   },
   "source": [
    "Now, we'll create a `LangChain` tool that allows our agent to insert critiques, which uses a `pydantic` schema to guide the agent on what kind of results to insert.\n",
    "\n",
    "In LanceDB the primary abstraction you'll use to work with your data is a **Table**.  \n",
    "A Table is designed to store large numbers of columns and huge quantities of data! For those interested, a LanceDB is columnar-based, and uses Lance, an open data format to store data.\n",
    "\n",
    "This tool will create a Table if it does not exist and store the relevant information (the embedding, actions, and critiques)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "YfNVwoZ4TQbg"
   },
   "source": [
    "### Considering the image below\n",
    "1. ideation step gets a predefined number of output proposals (ideas) from the LLM.\n",
    "\n",
    "2. critiques all of the ideas looking for possible flaws in the proposals and picking the most appropriate suggestion.\n",
    "\n",
    "3. resolve, the LLM tries to improve the best idea from the [2] critique step. The output here constitutes the final answer.\n",
    "\n",
    "![image](https://miro.medium.com/v2/resize:fit:1400/1*dMfLxIR-s8osxkVUbORtGg.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "T42MpKkHTQbg"
   },
   "source": [
    "### Function to take input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "SxsPAHKXTQbg"
   },
   "outputs": [],
   "source": [
    "from langchain.tools import tool\n",
    "from pydantic import BaseModel, Field\n",
    "\n",
    "\n",
    "class InsertCritiquesInput(BaseModel):\n",
    "    info: str = Field(\n",
    "        description=\"should be demographics or interests or other information about the exercise request provided by the user\"\n",
    "    )\n",
    "    actions: str = Field(\n",
    "        description=\"numbered list of langchain agent actions taken (searched for, gave this response, etc.)\"\n",
    "    )\n",
    "    critique: str = Field(\n",
    "        description=\"negative constructive feedback on the actions you took, limitations, potential biases, and more\"\n",
    "    )\n",
    "\n",
    "\n",
    "# @tool(\"insert_critiques\", args_schema=InsertCritiquesInput)\n",
    "def insert_critiques(info: str, actions: str, critique: str) -> str:\n",
    "    \"Insert actions and critiques for similar exercise requests in the future.\" \"\"\n",
    "    table_name = \"exercise-routine\"\n",
    "    if table_name not in db.table_names():\n",
    "        tbl = db.create_table(\n",
    "            table_name,\n",
    "            [{\"vector\": embed_func(info)[0], \"actions\": actions, \"critique\": critique}],\n",
    "        )\n",
    "    else:\n",
    "        tbl = db.open_table(table_name)\n",
    "        tbl.add(\n",
    "            [{\"vector\": embed_func(info)[0], \"actions\": actions, \"critique\": critique}]\n",
    "        )\n",
    "    return \"Inserted and done.\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "JeuYUpFaTQbg"
   },
   "source": [
    "Similarly, let's create a tool for retrieving critiques. We'll retrieve the actions and critiques from the top 5 most similar user inputs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Eqdy1EPhTQbg"
   },
   "source": [
    "### Retrieve Input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "id": "vOcLmd5aTQbg"
   },
   "outputs": [],
   "source": [
    "class RetrieveCritiquesInput(BaseModel):\n",
    "    query: str = Field(\n",
    "        description=\"should be demographics or interests or other information about the exercise request provided by the user\"\n",
    "    )\n",
    "\n",
    "\n",
    "# @tool(\"retrieve_critiques\", args_schema=RetrieveCritiquesInput)\n",
    "def retrieve_critiques(query: str) -> str:\n",
    "    \"Retrieve actions and critiques for similar exercise requests.\" \"\"\n",
    "    table_name = \"exercise-routine\"\n",
    "    if table_name in db.table_names():\n",
    "        tbl = db.open_table(table_name)\n",
    "        results = (\n",
    "            tbl.search(embed_func(query)[0])\n",
    "            .limit(5)\n",
    "            .select([\"actions\", \"critique\"])\n",
    "            .to_pandas()\n",
    "        )\n",
    "        results_list = results.drop(\"vector\", axis=1).values.tolist()\n",
    "        return (\n",
    "            \"Continue with the list with relevant actions and critiques which are in the format [[action, critique], ...]:\\n\"\n",
    "            + str(results_list)\n",
    "        )\n",
    "    else:\n",
    "        return \"No info, but continue.\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MbwyaX8TTQbg"
   },
   "source": [
    "Let's now use LangChain to load our tools in. This includes our custom tools as well as a Google Search tool that uses SerpApi. We will use OpenAI's `gpt-3.5-turbo-0613` as our LLM.\n",
    "Note: You need to sign up on serpapi and paste the API key here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "id": "ChNa7HxiTQbh"
   },
   "outputs": [],
   "source": [
    "os.environ[\"SERPAPI_API_KEY\"] = \".....\"\n",
    "\n",
    "llm = ChatOpenAI(temperature=0, model=\"gpt-4o\")\n",
    "tools = load_tools([\"serpapi\"], llm=llm)\n",
    "tools.extend([insert_critiques, retrieve_critiques])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "vHimH62LTQbh"
   },
   "source": [
    "Before we run our agent, let's create a function that defines our prompt that we pass in to the agent, which allows us to pass in client information."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "VqFrb6FpTQbh"
   },
   "source": [
    "### Create Prompt Function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "id": "k7TTbWIYTQbh"
   },
   "outputs": [],
   "source": [
    "from langchain.agents import initialize_agent, AgentType\n",
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.tools import Tool\n",
    "\n",
    "\n",
    "def create_prompt(info: str) -> str:\n",
    "    prompt_start = (\n",
    "        \"Please execute actions as a fitness trainer based on the information about the user and their interests below.\\n\\n\"\n",
    "        + \"Info from the user:\\n\\n\"\n",
    "    )\n",
    "    prompt_end = (\n",
    "        \"\\n\\n1. Retrieve using user info and review the past actions and critiques if there is any\\n\"\n",
    "        + \"2. Keep past actions and critiques in mind while researching for an exercise routine with steps which we respond to the user\\n\"\n",
    "        + \"3. Before returning the response, it is of upmost importance to insert the actions you took (numbered list: searched for, found this, etc.) and critiques (negative feedback: limitations, potential biases, and more) into the database for getting better exercise routines in the future. \\n\"\n",
    "    )\n",
    "    return prompt_start + info + prompt_end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uWuMPzLvTQbh"
   },
   "source": [
    "### Run Agent\n",
    "\n",
    "Finally, let's create our run_agent function. We'll use the `STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION` agent in order to allow us to use multi-input tools (since we need to add client input, actions, and critiques as arguments)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "id": "O0B_ZtUqTQbh"
   },
   "outputs": [],
   "source": [
    "def run_agent(info):\n",
    "    # Change agent= to agent_type=\n",
    "    agent = initialize_agent(\n",
    "        tools,\n",
    "        llm,\n",
    "        agent_type=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,\n",
    "        verbose=True,\n",
    "    )\n",
    "    # Add a return or print to see the result\n",
    "    return agent.run(input=create_prompt(info))\n",
    "\n",
    "\n",
    "# You'll need to define these before running\n",
    "# Example placeholder definitions\n",
    "tools = [\n",
    "    Tool.from_function(\n",
    "        func=lambda x: \"Sample fitness advice\",\n",
    "        name=\"fitness_tool\",\n",
    "        description=\"Provides fitness advice based on user information\",\n",
    "    )\n",
    "]\n",
    "llm = ChatOpenAI(temperature=0)  # You may need to set up API keys"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-vwCYjXvTQbh"
   },
   "source": [
    "Let's run. Feel free to use your own input!\n",
    "\n",
    "Notice that in the first run there wouldn't be any critiques yet, since the database is empty. After the first run, critiques should appear. The provided output is the result of a particular run after a few runs."
   ]
  },
  {
   "cell_type": "code",
   "source": [
    "# Now try running the agent\n",
    "result = run_agent(\n",
    "    \"My name is Tevin, I'm a 19 year old university student at CMU. I love running.\"\n",
    ")"
   ],
   "metadata": {
    "id": "glidPlSx-cu8",
    "outputId": "fe4358b0-1d58-4ef5-d5f5-6d791317bc3c",
    "colab": {
     "base_uri": "https://localhost:8080/"
    }
   },
   "execution_count": 24,
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3mI should gather information about Tevin's fitness level, running experience, and any specific goals he may have.\n",
      "Action: fitness_tool\n",
      "Action Input: Tevin's age, university, love for running\u001b[0m\n",
      "Observation: \u001b[36;1m\u001b[1;3mSample fitness advice\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3mI should now tailor an exercise routine for Tevin based on his love for running and fitness level.\n",
      "Action: fitness_tool\n",
      "Action Input: Tevin's running experience, fitness goals\u001b[0m\n",
      "Observation: \u001b[36;1m\u001b[1;3mSample fitness advice\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3mI should make sure to include a variety of exercises to keep Tevin engaged and prevent boredom.\n",
      "Action: fitness_tool\n",
      "Action Input: Tevin's preferred types of exercises\u001b[0m\n",
      "Observation: \u001b[36;1m\u001b[1;3mSample fitness advice\u001b[0m\n",
      "Thought:\u001b[32;1m\u001b[1;3mI now know the final answer\n",
      "Final Answer: I have tailored an exercise routine for Tevin based on his love for running, fitness level, running experience, and fitness goals.\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    }
   ]
  },
  {
   "cell_type": "code",
   "source": [
    "print(result)"
   ],
   "metadata": {
    "id": "SXi3-pKz8VKS",
    "outputId": "caa18a2a-00f6-49ce-b9cc-d2e6ab01bed7",
    "colab": {
     "base_uri": "https://localhost:8080/"
    }
   },
   "execution_count": 25,
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "I have tailored an exercise routine for Tevin based on his love for running, fitness level, running experience, and fitness goals.\n"
     ]
    }
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.1"
  },
  "colab": {
   "provenance": []
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}